Results 1 - 11 of 11
1.
J Acoust Soc Am ; 155(3): 1694-1703, 2024 03 01.
Article in English | MEDLINE | ID: mdl-38426839

ABSTRACT

The cochlear implant (CI) is currently a vital technological device for helping deaf patients hear sounds, and it greatly enhances their appreciation of sound. Unfortunately, it performs poorly for music listening because of the insufficient number of electrodes and inaccurate identification of music features. Therefore, this study applied source separation technology with a self-adjustment function to enhance the music listening benefits for CI users. In the objective analysis, the source-to-distortion, source-to-interference, and source-to-artifact ratios were 4.88, 5.92, and 15.28 dB, respectively, significantly better than those of the Demucs baseline model. In the subjective analysis, the proposed method scored higher than the traditional baseline method VIR6 (vocal to instrument ratio, 6 dB) by approximately 28.1 and 26.4 (out of 100) in the multi-stimulus test with hidden reference and anchor test, respectively. The experimental results showed that the proposed method can benefit CI users in identifying music in a live concert, and that the personal self-fitting signal separation method gave better results than the default baselines (vocal to instrument ratio of 6 dB or 0 dB). This finding suggests that the proposed system is a potential method for enhancing the music listening benefits for CI users.


Subjects
Cochlear Implantation, Cochlear Implants, Deafness, Deep Learning, Music, Humans, Deafness/rehabilitation, Auditory Perception
2.
Article in English | MEDLINE | ID: mdl-37938964

ABSTRACT

Dysarthria, a speech disorder often caused by neurological damage, compromises the control of vocal muscles in patients, making their speech unclear and communication troublesome. Recently, voice-driven methods have been proposed to improve the speech intelligibility of patients with dysarthria. However, most methods require a significant representation of both the patient's and target speaker's corpus, which is problematic. This study aims to propose a data augmentation-based voice conversion (VC) system to reduce the recording burden on the speaker. We propose dysarthria voice conversion 3.1 (DVC 3.1) based on a data augmentation approach, including text-to-speech and StarGAN-VC architecture, to synthesize a large target and patient-like corpus to lower the burden of recording. An objective evaluation metric of the Google automatic speech recognition (Google ASR) system and a listening test were used to demonstrate the speech intelligibility benefits of DVC 3.1 under free-talk conditions. The DVC system without data augmentation (DVC 3.0) was used for comparison. Subjective and objective evaluation based on the experimental results indicated that the proposed DVC 3.1 system enhanced the Google ASR of two dysarthria patients by approximately [62.4%, 43.3%] and [55.9%, 57.3%] compared to unprocessed dysarthria speech and the DVC 3.0 system, respectively. Further, the proposed DVC 3.1 increased the speech intelligibility of two dysarthria patients by approximately [54.2%, 22.3%] and [63.4%, 70.1%] compared to unprocessed dysarthria speech and the DVC 3.0 system, respectively. The proposed DVC 3.1 system offers significant potential to improve the speech intelligibility performance of patients with dysarthria and enhance verbal communication quality.


Subjects
Dysarthria, Voice, Humans, Dysarthria/etiology, Speech Intelligibility/physiology, Laryngeal Muscles
3.
IEEE Trans Biomed Eng ; 70(12): 3330-3341, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37327105

ABSTRACT

OBJECTIVE: Although many speech enhancement (SE) algorithms have been proposed to promote speech perception in hearing-impaired patients, the conventional SE approaches that perform well under quiet and/or stationary noises fail under nonstationary noises and/or when the speaker is at a considerable distance. Therefore, the objective of this study is to overcome the limitations of the conventional speech enhancement approaches. METHOD: This study proposes a speaker-closed deep learning-based SE method together with an optical microphone to acquire and enhance the speech of a target speaker. RESULTS: The objective evaluation scores achieved by the proposed method outperformed the baseline methods by a margin of 0.21-0.27 and 0.34-0.64 in speech quality (HASQI) and speech comprehension/intelligibility (HASPI), respectively, for seven typical hearing loss types. CONCLUSION: The results suggest that the proposed method can enhance speech perception by cutting off noise from speech signals and mitigating interference caused by distance. SIGNIFICANCE: The results of this study show a potential way that can help improve the listening experience in enhancing speech quality and speech comprehension/intelligibility for hearing-impaired people.


Subjects
Cochlear Implants, Deep Learning, Hearing Aids, Hearing Loss, Speech Perception, Humans, Speech Intelligibility
4.
J Voice ; 2023 Jan 31.
Article in English | MEDLINE | ID: mdl-36732109

ABSTRACT

OBJECTIVE: Doctors currently rely primarily on auditory-perceptual evaluation, such as the grade, roughness, breathiness, asthenia, and strain (GRBAS) scale, to evaluate voice quality and determine treatment. However, the results given by individual physicians often differ because of subjective perceptions and the diagnosis time interval, particularly if the patient's symptoms are hard to judge. Therefore, an accurate computerized pathological voice quality assessment system will improve the quality of assessment. METHOD: This study proposes a self-attention-based deep learning system named self-attention-based bidirectional long short-term memory (SA BiLSTM). Different pitches [low, normal, high] and vowels [/a/, /i/, /u/] were added to the proposed model so that it learns, in a high-dimensional view, how professional doctors rate the grade, roughness, breathiness, asthenia, and strain scale. RESULTS: The experimental results showed that the proposed system outperformed the baseline systems. More specifically, the macro average of the F1 score, presented as a decimal, was used to compare classification accuracy. The (G, R, and B) scores of the proposed system were (0.768±0.011, 0.820±0.009, and 0.815±0.009), which are higher than those of the baseline systems: deep neural network (0.395±0.010, 0.312±0.019, 0.321±0.014) and convolutional neural network (0.421±0.052, 0.306±0.043, 0.3250±0.032), respectively. CONCLUSIONS: The proposed system, with SA BiLSTM, pitches, and vowels, provides a more accurate way to evaluate the voice. This will be helpful for clinical voice evaluations and will improve patients' benefits from voice therapy.
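
The abstract of entry 4 reports macro-averaged F1 scores. As a reminder of what that metric computes, here is a minimal sketch in plain Python: the F1 score is calculated per class and then averaged with equal weight per class. The function name `macro_f1` and the toy labels are illustrative assumptions, not taken from the study.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all classes that appear."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical severity ratings (0-2) for six voice samples
true_ratings = [0, 0, 1, 1, 2, 2]
predicted_ratings = [0, 1, 1, 1, 2, 0]
score = macro_f1(true_ratings, predicted_ratings)
```

Because every class counts equally, macro averaging penalizes a classifier that ignores rare severity levels, which may be one reason it is preferred over plain accuracy for imbalanced rating scales.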

5.
J Chin Med Assoc ; 86(1): 105-112, 2023 01 01.
Article in English | MEDLINE | ID: mdl-36300992

ABSTRACT

BACKGROUND: The population of young adults who are hearing impaired increases yearly, and a device that enables convenient hearing screening could help monitor their hearing. However, background noise is a critical issue that limits the capabilities of such a device. Therefore, this study evaluated the effectiveness of commercial active noise cancellation (ANC) headphones for hearing screening applications in the presence of background noise. In particular, six confounders were used for a comprehensive evaluation. METHODS: We enrolled 12 young adults (a total of 23 ears with normal hearing) to participate in this study. A cross-sectional self-controlled study was conducted to explore the effectiveness of hearing screening in the presence of background noise, with a total of 240 test conditions (=3 ANC models × 2 ANC function statuses × 2 noise types × 5 noise levels × 4 frequencies) for each test ear. Subsequently, a linear regression model was used to prove the effectiveness of ANC headphones for hearing screening applications in the presence of background noise with six confounders. RESULTS: The experimental results showed that, on average, the ANC function of headphones can improve the effectiveness of hearing screening tasks in the presence of background noise. Specifically, the statistical analysis showed that the ANC function enabled a significant 10% improvement (p < 0.001) compared with no ANC function. CONCLUSION: This study confirmed the effectiveness of ANC headphones for young adult hearing screening applications in the presence of background noise. Furthermore, the statistical results confirmed that as confounding variables, noise type, noise level, hearing screening frequency, ANC headphone model, and sex all affect the effectiveness of the ANC function. These findings suggest that ANC is a potential means of helping users obtain high-accuracy hearing screening results in the presence of background noise. Moreover, we present possible directions of development for ANC headphones in future studies.


Subjects
Hearing Loss, Noise, Young Adult, Humans, Pilot Projects, Cross-Sectional Studies, Noise/prevention & control, Hearing
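
The abstract of entry 5 states that each test ear went through 240 conditions (3 ANC models × 2 ANC function statuses × 2 noise types × 5 noise levels × 4 frequencies). The short sketch below enumerates such a full factorial design to confirm the count. The factor levels are placeholder labels, since the abstract does not list the actual models, noise levels, or frequencies.

```python
from itertools import product

# Placeholder level names; only the number of levels per factor comes from the abstract.
factors = {
    "anc_model": ["model_1", "model_2", "model_3"],
    "anc_status": ["on", "off"],
    "noise_type": ["type_1", "type_2"],
    "noise_level": [f"level_{i}" for i in range(1, 6)],
    "frequency": [f"freq_{i}" for i in range(1, 5)],
}

# One dict per test condition, covering every combination of factor levels
conditions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
n_conditions = len(conditions)
```

Enumerating the design this way also makes it straightforward to randomize the presentation order per ear, although the abstract does not say whether the study did so.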
6.
Sensors (Basel) ; 22(19)2022 Sep 27.
Article in English | MEDLINE | ID: mdl-36236430

ABSTRACT

With the development of active noise cancellation (ANC) technology, ANC has been used to mitigate the effects of environmental noise on audiometric results. However, objective evaluation methods supporting the accuracy of audiometry for ANC exposure to different levels of noise have not been reported. Accordingly, the audio characteristics of three different ANC headphone models were quantified under different noise conditions and the feasibility of ANC in noisy environments was investigated. Steady (pink noise) and non-steady noise (cafeteria babble noise) were used to simulate noisy environments. We compared the integrity of pure-tone signals obtained from three different ANC headphone models after processing under different noise scenarios and analyzed the degree of ANC signal correlation based on the Pearson correlation coefficient compared to pure-tone signals in quiet. The objective signal correlation results were compared with audiometric screening results to confirm the correspondence. Results revealed that ANC helped mitigate the effects of environmental noise on the measured signal and the combined ANC headset model retained the highest signal integrity. The degree of signal correlation was used as a confidence indicator for the accuracy of hearing screening in noise results. It was found that the ANC technique can be further improved for more complex noisy environments.


Subjects
Mass Screening, Noise, Audiometry, Pure-Tone/methods, Feasibility Studies, Hearing
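
The abstract of entry 6 uses the Pearson correlation coefficient between a processed signal and the pure tone recorded in quiet as an indicator of signal integrity. The following is a minimal sketch of that kind of comparison in plain Python. The synthetic 1 kHz tone, the sampling rate, and the additive Gaussian noise are assumptions made for illustration; the study's recording chain and noise types (pink noise, cafeteria babble) are not reproduced here.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def pure_tone(freq_hz, fs=16000, duration_s=0.5):
    """Unit-amplitude sine tone sampled at fs Hz."""
    return [math.sin(2 * math.pi * freq_hz * n / fs) for n in range(int(fs * duration_s))]


rng = random.Random(0)                       # fixed seed for repeatability
reference = pure_tone(1000)                  # stands in for the tone recorded in quiet
degraded = [s + rng.gauss(0.0, 0.5) for s in reference]  # stands in for residual noise

r_clean = pearson_r(reference, reference)
r_degraded = pearson_r(reference, degraded)
```

A value close to 1 indicates that the tone survived processing largely intact, while lower values indicate more residual noise. Note that this sample-wise correlation assumes the two signals are time-aligned; a real recording would likely need alignment first.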
7.
Annu Int Conf IEEE Eng Med Biol Soc ; 2022: 1972-1976, 2022 07.
Article in English | MEDLINE | ID: mdl-36086160

ABSTRACT

Envelope waveforms can be extracted from multiple frequency bands of a speech signal, and they carry important intelligibility information for human speech communication. This study aimed to investigate whether a deep learning-based model using temporal envelope features could synthesize intelligible speech, and to study the effect of reducing the number of temporal envelopes (from 8 to 2 in this work) on the intelligibility of the synthesized speech. The objective evaluation metric of short-time objective intelligibility (STOI) showed that, on average, the synthesized speech of the proposed approach provided higher STOI scores (i.e., 0.8) in each test condition, and the human listening test showed that the average word correct rate of eight listeners was higher than 97.5%. These findings indicate that the proposed deep learning-based system is a potential approach for synthesizing highly intelligible speech from limited envelope information in the future.


Subjects
Deep Learning, Speech Perception, Auditory Perception, Humans, Speech Intelligibility, Time Factors
8.
JASA Express Lett ; 2(5): 055202, 2022 05.
Article in English | MEDLINE | ID: mdl-36154065

ABSTRACT

Medical masks have become necessary of late because of the COVID-19 outbreak; however, they tend to attenuate the energy of speech signals and affect speech quality. Therefore, this study proposes an optical-based microphone approach to obtain speech signals from speakers' medical masks. Experimental results showed that the optical-based microphone approach achieved better performance (85.61%) than the two baseline approaches, namely, omnidirectional (24.17%) and directional microphones (31.65%), in the case of long-distance speech and background noise. The results suggest that the optical-based microphone method is a promising approach for acquiring speech from a medical mask.


Subjects
COVID-19, Hearing Aids, Speech Perception, COVID-19/prevention & control, Equipment Design, Humans, Masks, Speech, Vibration
9.
Comput Methods Programs Biomed ; 215: 106602, 2022 Mar.
Article in English | MEDLINE | ID: mdl-35021138

ABSTRACT

BACKGROUND AND OBJECTIVE: Most dysarthric patients encounter communication problems due to unintelligible speech. Currently, there are many voice-driven systems aimed at improving their speech intelligibility; however, the intelligibility performance of these systems is affected by challenging application conditions (e.g., time variance of patient's speech and background noise). To alleviate these problems, we proposed a dysarthria voice conversion (DVC) system for dysarthric patients and investigated the benefits under challenging application conditions. METHOD: A deep learning-based voice conversion system with phonetic posteriorgram (PPG) features, called the DVC-PPG system, was proposed in this study. An objective-evaluation metric of Google automatic speech recognition (Google ASR) system and a listening test were used to demonstrate the speech intelligibility benefits of DVC-PPG under quiet and noisy test conditions; in addition, the well-known voice conversion system using mel-spectrogram, DVC-Mels, was used for comparison to verify the benefits of the proposed DVC-PPG system. RESULTS: For the average accuracy of two subjects in the duplicate and outside test conditions, the objective-evaluation metric of Google ASR showed that the DVC-PPG system provided higher speech recognition rates (83.2% and 67.5%) than dysarthric speech (36.5% and 26.9%) and DVC-Mels (52.9% and 33.8%) under quiet conditions. Moreover, the DVC-PPG system provided more stable performance than the DVC-Mels under noisy test conditions. In addition, the results of the listening test showed that the speech-intelligibility performance of DVC-PPG was better than those obtained via the dysarthria speech and DVC-Mels under the duplicate and outside conditions, respectively. CONCLUSIONS: The objective-evaluation metric and listening test results showed that the recognition rate of the proposed DVC-PPG system was significantly higher than those obtained via the original dysarthric speech and DVC-Mels system. Therefore, it can be inferred from our study that the DVC-PPG system can improve the ability of dysarthric patients to communicate with people under challenging application conditions.


Subjects
Speech Intelligibility, Voice, Dysarthria, Humans, Phonetics, Speech Production Measurement
10.
J Med Internet Res ; 23(10): e25460, 2021 10 28.
Article in English | MEDLINE | ID: mdl-34709193

ABSTRACT

BACKGROUND: Cochlear implant technology is a well-known approach to help deaf individuals hear speech again and can improve speech intelligibility in quiet conditions; however, it still has room for improvement in noisy conditions. More recently, it has been proven that deep learning-based noise reduction, such as noise classification and deep denoising autoencoder (NC+DDAE), can benefit the intelligibility performance of patients with cochlear implants compared to classical noise reduction algorithms. OBJECTIVE: Following the successful implementation of the NC+DDAE model in our previous study, this study aimed to propose an advanced noise reduction system using knowledge transfer technology, called NC+DDAE_T; examine the proposed NC+DDAE_T noise reduction system using objective evaluations and subjective listening tests; and investigate which layer substitution of the knowledge transfer technology in the NC+DDAE_T noise reduction system provides the best outcome. METHODS: The knowledge transfer technology was adopted to reduce the number of parameters of the NC+DDAE_T compared with the NC+DDAE. We investigated which layer should be substituted using short-time objective intelligibility and perceptual evaluation of speech quality scores as well as t-distributed stochastic neighbor embedding to visualize the features in each model layer. Moreover, we enrolled 10 cochlear implant users for listening tests to evaluate the benefits of the newly developed NC+DDAE_T. RESULTS: The experimental results showed that substituting the middle layer (ie, the second layer in this study) of the noise-independent DDAE (NI-DDAE) model achieved the best performance gain regarding short-time objective intelligibility and perceptual evaluation of speech quality scores. Therefore, the parameters of layer 3 in the NI-DDAE were chosen to be replaced, thereby establishing the NC+DDAE_T. Both objective and listening test results showed that the proposed NC+DDAE_T noise reduction system achieved similar performances compared with the previous NC+DDAE in several noisy test conditions. However, the proposed NC+DDAE_T only required a quarter of the number of parameters compared to the NC+DDAE. CONCLUSIONS: This study demonstrated that knowledge transfer technology can help reduce the number of parameters in an NC+DDAE while keeping similar performance rates. This suggests that the proposed NC+DDAE_T model may reduce the implementation costs of this noise reduction system and provide more benefits for cochlear implant users.


Subjects
Cochlear Implantation, Cochlear Implants, Speech Perception, Humans, Noise, Speech Intelligibility
11.
JMIR Mhealth Uhealth ; 8(12): e16746, 2020 12 03.
Article in English | MEDLINE | ID: mdl-33270033

ABSTRACT

BACKGROUND: Voice disorders mainly result from chronic overuse or abuse, particularly in occupational voice users such as teachers. Previous studies proposed a contact microphone attached to the anterior neck for ambulatory voice monitoring; however, the inconvenience associated with taping and wiring, along with the lack of real-time processing, has limited its clinical application. OBJECTIVE: This study aims to (1) propose an automatic speech detection system using wireless microphones for real-time ambulatory voice monitoring, (2) examine the detection accuracy under controlled environment and noisy conditions, and (3) report the results of the phonation ratio in practical scenarios. METHODS: We designed an adaptive threshold function to detect the presence of speech based on the energy envelope. We invited 10 teachers to participate in this study and tested the performance of the proposed automatic speech detection system regarding detection accuracy and phonation ratio. Moreover, we investigated whether the unsupervised noise reduction algorithm (ie, log minimum mean square error) can overcome the influence of environmental noise in the proposed system. RESULTS: The proposed system exhibited an average accuracy of speech detection of 89.9%, ranging from 81.0% (67,357/83,157 frames) to 95.0% (199,201/209,685 frames). Subsequent analyses revealed a phonation ratio between 44.0% (33,019/75,044 frames) and 78.0% (68,785/88,186 frames) during teaching sessions of 40-60 minutes; the durations of most of the phonation segments were less than 10 seconds. The presence of background noise reduced the accuracy of the automatic speech detection system, and an adjuvant noise reduction function could effectively improve the accuracy, especially under stable noise conditions. CONCLUSIONS: This study demonstrated an average detection accuracy of 89.9% in the proposed automatic speech detection system with wireless microphones. The preliminary results for the phonation ratio were comparable to those of previous studies. Although the wireless microphones are susceptible to background noise, an additional noise reduction function can alleviate this limitation. These results indicate that the proposed system can be applied for ambulatory voice monitoring in occupational voice users.


Subjects
Speech Acoustics, Voice Disorders, Algorithms, Humans, Phonation, Speech
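
The abstract of entry 11 describes detecting speech with an adaptive threshold on the energy envelope and then reporting a phonation ratio (speech frames divided by all frames). The sketch below shows one generic way such a detector can work; it is not the authors' threshold function. The noise-floor tracking rule, the `margin` and `alpha` parameters, the frame length, and the synthetic test signal are all assumptions, and the sketch further assumes the recording starts with a non-speech frame.

```python
import math
import random


def frame_energies(signal, frame_len):
    """Mean energy of consecutive non-overlapping frames (a coarse energy envelope)."""
    return [
        sum(s * s for s in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def detect_speech(energies, margin=4.0, alpha=0.95):
    """Flag frames whose energy exceeds `margin` times a running noise-floor estimate.

    The noise floor is an exponential average updated only on frames judged
    non-speech, so the threshold adapts to slow changes in background noise.
    """
    if not energies:
        return []
    noise_floor = max(energies[0], 1e-12)  # assumes the first frame is noise only
    decisions = []
    for energy in energies:
        is_speech = energy > margin * noise_floor
        if not is_speech:
            noise_floor = alpha * noise_floor + (1.0 - alpha) * energy
        decisions.append(is_speech)
    return decisions


def phonation_ratio(decisions):
    """Fraction of frames flagged as speech."""
    return sum(decisions) / len(decisions)


# Synthetic check: 10 noise frames, 10 frames with a 200 Hz tone, 10 noise frames
fs, frame_len = 16000, 160
rng = random.Random(0)
signal = []
for n in range(30 * frame_len):
    in_speech = 10 * frame_len <= n < 20 * frame_len
    tone = 0.5 * math.sin(2 * math.pi * 200 * n / fs) if in_speech else 0.0
    signal.append(tone + rng.gauss(0.0, 0.01))

decisions = detect_speech(frame_energies(signal, frame_len))
ratio = phonation_ratio(decisions)
```

Updating the noise floor only on non-speech frames keeps loud speech from raising the threshold, but it also means a sudden, sustained increase in background noise would be misread as speech. That weakness is in line with the abstract's remark that background noise reduced detection accuracy.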